In this vignette, we explore how to label your track files (activity and pressure) and provide tips to make the exercise more efficient. To see where this exercise fits in with the overall process, see the vignette How to use GeoPressureR.
library(GeoPressureR)
library(raster)
library(plotly)
library(RColorBrewer)
pam_data = pam_read(pathname = system.file("extdata", package = "GeoPressureR"),
                    crop_start = "2017-06-20", crop_end = "2018-05-02")
Motivation
The most important reason motivating manual editing is that pressure mapping relies on precise activity and pressure data. Activity labeling defines stationary periods and flight duration. Short stationary periods can be particularly hard to define, such that expert knowledge is essential. Since flight duration is the key input in the movement model, having an accurate flight duration is critical to correctly estimate the distance traveled by the bird between two stationary periods. The pressure timeseries matching algorithm is highly sensitive to erroneously labeled pressure, such that even a few mislabeled datapoints can throw off the estimation map.
Each species’ migration behaviour is so specific that manual editing remains the fastest option. You can expect to spend from 30 seconds (e.g. Mangrove Kingfisher) to 10 minutes (e.g. Eurasian Nightjar) per track, depending on the complexity of the species’ migration.
Manual editing also provides a sense of what the bird is doing. You will learn how the bird is moving (e.g. long continuous high altitude flight, short flights over multiple days, alternation between short migration flights and stopovers, etc.). It also provides a sense of the uncertainty of your classification, which is useful to understand and interpret your results.
That being said, it is still worth starting the manual editing from an automatically labeled timeseries. pam_classify() defines migratory flight when activity is high for a long period. For other possible classification methods, refer to the PAMLr manual.
Basic labeling principles
The procedure involves (1) labeling migratory activity as 1 and (2) labeling pressure datapoints to be discarded from the matching exercise as 1.
The outcome of the activity labeling is twofold:
- defined stationary periods, during which the bird is considered static relative to the size of the grid (~10-30km). The start and end of each stationary period are then used to define the pressure timeseries to be matched.
- defined flight durations, which are used in the movement model to define the distance between stationary periods.
Labeling pressure makes it possible to deal with situations where the bird changes altitude. Since the reanalysis data to be matched against are provided at ground level, we want the geolocator pressure timeseries to be at a single elevation and must therefore discard any datapoint recorded at a different altitude.
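The way activity labels translate into stationary periods and flights can be sketched in base R. This is a minimal illustration on made-up 5-minute labels, not the implementation used by pam_sta(); the variable names, and the convention of counting a duration as the number of samples times the sampling interval, are assumptions.

```r
# Synthetic 5-minute activity labels: 1 = migratory flight, 0 = everything else.
# (Hypothetical data, only to illustrate the principle.)
date <- seq(as.POSIXct("2017-08-30 20:00", tz = "UTC"), by = "5 min", length.out = 24)
label <- c(rep(0, 6), rep(1, 8), rep(0, 2), rep(1, 4), rep(0, 4))

# Run-length encoding groups consecutive identical labels into periods.
runs <- rle(label)
last <- cumsum(runs$lengths)
first <- last - runs$lengths + 1

periods <- data.frame(
  type = ifelse(runs$values == 1, "flight", "stationary"),
  start = date[first],
  end = date[last],
  duration_min = runs$lengths * 5 # assumed convention: samples x 5 minutes
)
print(periods)
```

In this toy series, the 10-minute stationary period comes from only two datapoints. Relabeling them as flight would merge the 40-minute and 20-minute flights into a single 70-minute flight, which is why short stationary periods deserve a careful look.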
Introduction to TRAINSET
We suggest using TRAINSET, a web-based graphical tool for labeling timeseries. You can read more about TRAINSET on www.trainset.geocene.com and on their GitHub.
The tool interface is quite intuitive. Start by uploading your .csv file (e.g., 18IC_act_pres.csv).

View after uploading a file
A few tips:
- Keyboard shortcuts can considerably speed up navigation (zoom in/out, move left/right) and labeling (add/remove a label), specifically with SHIFT.
- Because of the large number of datapoints, keeping a narrow temporal window will prevent your browser from becoming slow or unresponsive.
- You can change the “Reference Series” to pressure to see both timeseries at the same time, which helps interpret what the bird is doing.
- Play with the y-axis range to properly see small pressure variations which may not be visible at full range.
- TRAINSET offers more flexibility with the labels than required: you can add and remove label values (bottom-right of the page). In order for trainset_read() to work, do not change, edit or add any label; simply use the ones offered: 0 and 1.
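Before reading an exported file back into R, it can help to confirm that only the labels 0 and 1 are present. The sketch below assumes that the exported csv has the columns series, timestamp, value and label; check this against your own file. check_labels() is a hypothetical helper written for this example and is not part of GeoPressureR.

```r
# Write a small synthetic file in the layout assumed here for TRAINSET exports:
# columns series, timestamp, value, label (an assumption; check your own file).
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "series,timestamp,value,label",
  "act,2017-08-30T20:00:00.000Z,12,0",
  "act,2017-08-30T20:05:00.000Z,48,1",
  "pres,2017-08-30T20:00:00.000Z,985.2,0",
  "pres,2017-08-30T20:05:00.000Z,960.1,2"
), csv_file)

# Hypothetical helper: return the rows whose label is not one of the allowed values.
check_labels <- function(file, allowed = c("0", "1")) {
  # Read labels as text so that an empty or renamed label is not silently coerced.
  csv <- read.csv(file, colClasses = c(label = "character"))
  csv[!(csv$label %in% allowed), ]
}

bad <- check_labels(csv_file)
print(bad)
```

Any row returned by this check points to a datapoint that was given a label other than 0 or 1 and should be corrected in TRAINSET before continuing.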
Four tests to check labeling
To assess the quality of your labeling, you can use this script comprising four basic tests.
Test 1: Duration of stationary periods and flights
The first test consists of checking the durations of flights and stationary periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v1.csv")
pam_data = pam_sta(pam_data)
knitr::kable(pam_data$sta[difftime(pam_data$sta$end, pam_data$sta$start, units = "mins") < 60 | pam_data$sta$next_flight_duration < 30, ])
| | start | end | duration | next_flight_duration | sta_id |
|---|---|---|---|---|---|
| 7 | 2017-08-30 23:45:00 | 2017-08-30 23:55:00 | 10 mins | 255 mins | 7 |
| 27 | 2018-04-15 19:30:00 | 2018-04-15 20:10:00 | 40 mins | 85 mins | 27 |
| 30 | 2018-04-29 23:35:00 | 2018-04-29 23:45:00 | 10 mins | 170 mins | 30 |
| 32 | 2018-04-30 19:20:00 | 2018-04-30 19:40:00 | 20 mins | 125 mins | 32 |
| 33 | 2018-04-30 21:45:00 | 2018-04-30 21:55:00 | 10 mins | 65 mins | 33 |
| 34 | 2018-04-30 23:00:00 | 2018-04-30 23:10:00 | 10 mins | 50 mins | 34 |
| 35 | 2018-05-01 00:00:00 | 2018-05-01 00:10:00 | 10 mins | 35 mins | 35 |
| 36 | 2018-05-01 00:45:00 | 2018-05-01 23:30:00 | 1365 mins | 0 mins | 36 |
You may want to check the labeling of flights shorter than one hour and the labeling before and after stationary periods shorter than a couple of hours. Using the exact times from the table above, you can edit your labeling in TRAINSET and export a new version of the csv file. Note that the last row has a next_flight_duration of 0 because it is the last stationary period.
Test 2: Pressure timeseries
The second check to carry out before computing the map is to visualize the pressure timeseries and their grouping into stationary periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v2.csv")
pam_data = pam_sta(pam_data)
p <- ggplot() +
  geom_line(data = pam_data$pressure, aes(x = date, y = obs), col = "grey") +
  geom_line(data = subset(pam_data$pressure, sta_id != 0),
            aes(x = date, y = obs, col = as.factor(sta_id))) +
  theme_bw() +
  scale_colour_manual(values = rep(RColorBrewer::brewer.pal(9, "Set1"), times = 4))
# scale_colour_brewer(type="qualitative", palette = 'Set1')
ggplotly(p, dynamicTicks = TRUE) %>%
  layout(showlegend = FALSE,
         legend = list(orientation = "h", x = -0.5),
         yaxis = list(title = "Pressure [hPa]"))
Plotting this figure with Plotly allows you to zoom in and pan to check that all timeseries are correctly grouped. Make sure each stationary period does not include any pressure measurement from flight (e.g. 1-Sep-2017). You might spot some anomalies in the temporal variation of pressure. In some cases, you can already label the pressure timeseries to remove them.
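As a complement to the visual check, a quick numerical screen can point to stationary periods that probably still contain flight datapoints. The sketch below uses a made-up table with the same column names as pam_data$pressure in the code above (date, obs, sta_id); the 5 hPa threshold is an arbitrary assumption to adapt to your data and temporal resolution.

```r
# Synthetic pressure table with the same column names as pam_data$pressure
# in the code above (date, obs, sta_id); values are made up.
pressure <- data.frame(
  date = seq(as.POSIXct("2017-08-30 12:00", tz = "UTC"), by = "30 min", length.out = 12),
  obs = c(985, 985.2, 985.1, 984.9, 985, 962, 940.3, 940.1, 940.4, 940.2, 940.3, 940.5),
  sta_id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# Largest absolute change between consecutive measurements, per stationary period.
max_step <- tapply(pressure$obs, pressure$sta_id, function(x) max(abs(diff(x))))

# Threshold in hPa per time step: an arbitrary value chosen for this example.
suspicious <- names(max_step)[max_step > 5]
print(max_step)
print(suspicious)
```

Here the last datapoint of stationary period 1 was recorded during the climb of the following flight, so that period is flagged and its end should be relabeled in TRAINSET.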
Test 3: Pressure timeseries match
So far we have checked that the pressure timeseries are correctly labeled with their respective stationary periods, and that they look relatively smooth. At this stage, the timeseries are good enough to be matched with the reanalysis data. The third test consists of finding the location with the best match and comparing the pressure timeseries. This makes it possible to distinguish bird movements from natural variations in pressure. This is by far the most difficult step, and multiple iterations will be necessary to get the best result.
As computation can take some time, we recommend starting with a few long stationary periods and, once the results are satisfying, moving to the shorter periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v3.csv")
pam_data = pam_sta(pam_data)
sta_id_keep = pam_data$sta$sta_id[difftime(pam_data$sta$end,pam_data$sta$start, units = "hours")>12]
pam_data$pressure$sta_id[!(pam_data$pressure$sta_id %in% sta_id_keep)] = NA
message("Number of stationary periods to query: ", length(sta_id_keep))
We can estimate the probability map for each stationary period.
raster_list = geopressure_map(pam_data$pressure, extent = c(-16, 20, 0, 50), scale = 10, max_sample = 100)
prob_map_list = geopressure_prob_map(raster_list)
For each stationary period, we locate the best match and query the pressure timeseries with geopressure_ts() at this location. If you get errors, check the probability map and the best match (see the commented line starting with leaflet()).
ts_list = list()
for (i_r in seq_along(prob_map_list)) {
  i_s = metadata(prob_map_list[[i_r]])$sta_id
  # find the max value of probability
  tmp = as.data.frame(prob_map_list[[i_r]], xy = TRUE)
  lon = tmp$x[which.max(tmp[, 3])]
  lat = tmp$y[which.max(tmp[, 3])]
  # Visual check
  # leaflet() %>% addTiles() %>% addRasterImage(prob_map_list[[i_r]]) %>% addMarkers(lat=lat,lng=lon)
  # query the pressure at this location for the current stationary period
  message("query:", i_r, "/", length(prob_map_list))
  ts_list[[i_r]] = geopressure_ts(lon,
                                  lat,
                                  pressure = subset(pam_data$pressure, sta_id == i_s))
  # Add sta_id
  ts_list[[i_r]]["sta_id"] = i_s
  # Datapoints of this stationary period that are not labeled as discarded
  # (assumes class is TRUE for discarded datapoints, as in the figures below)
  id = which(pam_data$pressure$sta_id == i_s & !pam_data$pressure$class)
  # Remove mean: shift the reanalysis timeseries to the mean of the geolocator pressure
  ts_list[[i_r]]$pressure0 = ts_list[[i_r]]$pressure - mean(ts_list[[i_r]]$pressure) + mean(pam_data$pressure$obs[id])
}
We can now look at a similar figure of pressure timeseries, but this time comparing the geolocator data with the best match from the reanalysis data.
p <- ggplot() +
  geom_line(data = pam_data$pressure, aes(x = date, y = obs), colour = "grey") +
  geom_point(data = subset(pam_data$pressure, class), aes(x = date, y = obs), colour = "black") +
  geom_line(data = do.call("rbind", ts_list), aes(x = date, y = pressure0, col = as.factor(sta_id))) +
  theme_bw() +
  scale_colour_manual(values = rep(RColorBrewer::brewer.pal(9, "Set1"), times = 4))
ggplotly(p, dynamicTicks = TRUE) %>%
  layout(showlegend = FALSE,
         legend = list(orientation = "h", x = -0.5),
         yaxis = list(title = "Pressure [hPa]"))
You can use this figure to identify periods where the mismatch indicates a problem with the labeling. Often, it will indicate that the bird was changing altitude. This happens regularly during migration, when the bird lands in one location and performs one or two short flights in the morning, changing altitude. Activity data on TRAINSET can also help you understand what the bird is doing.
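The logic of this comparison can be illustrated on synthetic data. In the sketch below, the reanalysis series, the constant 20 hPa elevation offset, the 8 hPa altitude change and the 3 hPa threshold are all made up for illustration; only the mean alignment mirrors the pressure0 computation in the loop above.

```r
# Synthetic example: the geolocator follows the reanalysis pressure with a
# constant offset (the bird sits at a different elevation than the ground
# level of the grid cell), except for four points where it changed altitude.
n <- 48
era5 <- 1000 + 3 * sin(seq(0, 2 * pi, length.out = n)) # made-up reanalysis series
obs <- era5 - 20                                        # constant elevation offset
moved <- 30:33
obs[moved] <- obs[moved] - 8                            # temporary altitude change

# Align the means, as done for pressure0 in the loop above.
era5_0 <- era5 - mean(era5) + mean(obs)
error <- obs - era5_0

# Flag datapoints whose residual deviates strongly from the typical residual.
# The 3 hPa threshold is an arbitrary choice for this illustration.
flagged <- which(abs(error - median(error)) > 3)
print(flagged)
```

Note that the four mislabeled datapoints also pull the mean of obs down, so the aligned reanalysis series sits slightly below the correctly labeled datapoints. This is one reason why several rounds of labeling and matching are usually needed.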
Test 4: Histogram of pressure error
Finally, you can also look at the histogram of the pressure error (difference between the geolocator and ERA5 pressure). For long stationary periods (longer than ~5 days), you want to check that there is a single mode in your distribution. Two modes indicate that the bird is spending time at two different altitudes. This is common when birds have a day site and a night roost at different elevations. You should also note the spread of the distribution: this value can guide you in setting the standard deviation parameter s in geopressure_prob_map().
par(mfrow = c(5, 6), mar = c(1, 1, 3, 1))
for (i_r in seq_along(ts_list)) {
  i_s = unique(ts_list[[i_r]]$sta_id)
  df3 <- merge(ts_list[[i_r]], subset(pam_data$pressure, !class & sta_id == i_s), by = "date")
  df3$error = df3$pressure0 - df3$obs
  hist(df3$error, main = i_s, xlab = "", ylab = "")
  abline(v = 0, col = "red")
}
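With many stationary periods, it can be convenient to summarise each histogram with two numbers: the number of modes and the spread. The following sketch does this on deterministic synthetic errors; count_modes() is a hypothetical helper based on a kernel density estimate, and the 10% relative-height cut-off is an arbitrary assumption. It is meant as a screening aid and does not replace looking at the histograms.

```r
# Deterministic synthetic errors (hPa): a day site and a night roost that
# differ by about 6 hPa, each with a spread of 0.5 hPa.
error_two_sites <- c(qnorm(ppoints(300), mean = -3, sd = 0.5),
                     qnorm(ppoints(200), mean = 3, sd = 0.5))
error_one_site <- qnorm(ppoints(600), mean = 0, sd = 0.5)

# Hypothetical helper: count local maxima of a kernel density estimate.
count_modes <- function(error, min_rel_height = 0.1) {
  y <- density(error)$y
  i <- 2:(length(y) - 1)
  # A peak is higher than its left neighbour and not lower than its right one.
  is_peak <- y[i] > y[i - 1] & y[i] >= y[i + 1]
  peak_height <- y[i][is_peak]
  # Ignore tiny bumps below a fraction of the highest peak.
  sum(peak_height > min_rel_height * max(y))
}

n_modes_two <- count_modes(error_two_sites)
n_modes_one <- count_modes(error_one_site)
spread_one <- sd(error_one_site)
print(c(n_modes_two, n_modes_one))
print(spread_one)
```

For a stationary period with a single mode, the standard deviation of the error gives a first idea of a suitable value for s. With two modes, the spread mostly reflects the elevation difference between the two sites, so the labeling should be revisited first.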
Common challenges and tips to address them
In the following section, we use examples to illustrate common challenges that may be encountered during manual editing, and suggestions on how to address them.
Outliers during flights due to low bird activity
Birds can have periods of low activity during their flight (e.g., less flapping). In those cases, the automatic labeling of activity with the KNN classifier mislabels these points as stationary periods, as illustrated in the example below for the night of the 31st of August. A single mislabeled point can incorrectly split the flight into multiple short flights. This error will be highlighted by test 1 described above. However, birds might also display lower activity at the beginning or end of their flight, and these datapoints are often misclassified, as illustrated for all three nights in the figure. In such cases, test 1 will not be able to pick them up.

But if the low activity happens well before the bird reaches the ground, as illustrated in the example below, it will be visible in the figure of test 2. However, this is not always the case, and we must assess on a case-by-case basis whether these datapoints should be included in the flight or not.

Make sure you zoom in before you edit outliers!
Anomalies in a pressure timeseries might not be obvious at first.

But if you zoom in to a narrower pressure range, you will see what is happening. This is a Tawny Pipit breeding near a mine site with rugged topography. While breeding, it looks like it is staying at a relatively constant elevation, but toward the end, you can see sudden drops in pressure indicating that the bird changed altitude.

In such cases, you aim to label all pressure datapoints recorded while the bird was at a different altitude. Sometimes it is not obvious what is temporal variation of pressure and what is due to the bird changing altitude. In such cases, keep only the datapoints that you are confident about (the first part of the timeseries) and run test 3.

With a long timeseries such as this one, test 3 will easily pick up the right location and the timeseries that you want to match. Then you simply have to de-label the datapoints at the end of your timeseries that fit the ERA5 green line. For shorter timeseries, you might need several iterations to pick up the correct match.
Short stationary halts between flights
Interpreting bird behaviour and defining stationary periods can be difficult, for example when birds extend their migration into the day but with lower intensity, so that there is no clear end.

In other cases, the bird stops for a couple of hours and then seems to be active again. This could be low-intensity migratory movement, a short break followed by more migratory flight, or a landing at the stopover location followed by a relocation in the early morning with daylight.

The question is whether to label these breaks as stationary periods or not.
Referring to the pressure timeseries can help assess whether the bird changes location. For example, if the low activity is followed by high activity accompanied by pressure change, we can consider that the bird then changed location, and label the low activity as a stationary period.
However, the bird may also land and then complete local flights within its stopover location (with very little pressure variation), in which case we do not want to create two different stationary periods.
Test 3 will be essential to ensure that no local vertical movement happened. Use the reanalysis data to find the best match.

Mountainous species
Mountainous species will display very specific behaviour with regular altitudinal changes.
This is very clear with the Ring Ouzel’s timeseries, which shows daily recurring movements that are not regular enough to make the process automatic and that sometimes change altitude. Choosing which datapoints to keep and which to discard might not always be easy. Both the 790hPa and 900hPa levels might work.
It’s often a good idea to zoom out on the time axis to see if a certain elevation seems more commonly used. Then proceed similarly to the Tawny Pipit case, iteratively keeping only the datapoints at the same elevation. Test 4 is often quite useful to make sure you haven’t forgotten some datapoints.


The Eurasian Hoopoe is a bit more difficult because it moves more continuously through the day, showing a more sinusoidal pattern.
This is the most difficult case, as you really can’t distinguish temporal variation from altitude changes.
After some iterations, you’ll end up with something relatively correct. Note that to estimate the uncertainty correctly in such cases, you will have to increase the standard deviation s. However, this behaviour is luckily restricted to its breeding ground.

In some cases, finding a single timeseries is too difficult. This is the case for the wintering site of this Ring Ouzel, which never returns to the same elevation. In such cases, you can discard the entire timeseries and only use the mask of absolute pressure values.

Luckily, mountainous species live in rather narrow areas (mountains). In this case, it is clear from the previous stationary periods that the bird was in Morocco, and with such low pressure (high elevation), only the Atlas Mountains fit the threshold criteria.
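To get a feel for what an absolute pressure value implies in terms of elevation, the standard-atmosphere approximation can be used. The sketch below is a simplification for illustration only: GeoPressureR builds its mask from the reanalysis pressure, not from this formula, and the toy elevation grid and the 500 m tolerance are assumptions. pressure_to_altitude() is a hypothetical helper.

```r
# International Standard Atmosphere approximation (troposphere only).
pressure_to_altitude <- function(p_hpa, p0_hpa = 1013.25) {
  44330 * (1 - (p_hpa / p0_hpa)^(1 / 5.255))
}

# Pressure levels mentioned above for the Ring Ouzel.
alt <- pressure_to_altitude(c(900, 790))
print(round(alt))

# Toy elevation grid in metres (made-up values): keep only cells whose ground
# elevation could plausibly host a bird measured at about 790 hPa. The 500 m
# tolerance is an arbitrary allowance for weather and topography within a cell.
elevation <- matrix(c(50, 400, 1200, 1900, 2300, 800), nrow = 2)
mask <- abs(elevation - pressure_to_altitude(790)) <= 500
print(mask)
```

A pressure of about 790 hPa corresponds to roughly 2000 m above sea level, which excludes all lowland cells of the toy grid. This is the intuition behind using the absolute pressure alone when no timeseries can be matched.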


Accurate classification of flight duration can be difficult when the bird migrates with less intensity at the end of the flight. Refer to the pressure timeseries to help you define the beginning and end of flight stages. Here, at the end of its nocturnal flight on 30 August, the bird may have completed shorter flights at its stopover location. These points should not be included in the stationary period.

Defining stationary periods exactly can be difficult for some species (here XX), with activities which could be low-intensity migratory movement, long non-migratory activity (feeding), or anything in between!
There will be situations where a certain classification of activity is not possible. It is worth remembering that the labeling of activity serves two purposes:
- Define flight duration, which will be used in the movement model and ultimately have the strongest impact on (1) the estimation of the position of short stopovers between long flights (i.e., how ) and (2) the estimation of flight speed when the position of the bird is well constrained. Ultimately, a few datapoints more or less won’t have a strong impact on long flights, but the estimation of short movements can be relatively tricky. To partially accommodate this, we compute an effort_duration for each flight, which normalizes the duration of the migratory flight by the intensity of the activity over the entire journey of the bird.
- Define stationary periods, which will be used in the pressure timeseries matching.
At this stage, it is very useful to add the pressure timeseries to understand the implications of defining stationary periods on the pressure timeseries.

Although this Red-capped Robin-chat was not very active during this morning, you can notice a drop of pressure after 9PM, while a similar level of activity before 9am on the next day doesn’t affect the pressure timeseries.
I think it’s best to think of a stationary period as a period where the pressure timeseries is continuous enough to be matched on the map.
A balance needs to be found between creating enough stationary periods to account for all positions of the bird that can be estimated, and creating too many stationary periods, which shortens the timeseries available for matching. This is important because we are looking to create long pressure timeseries containing sufficient temporal variation, but not variation due to local/short movements (often because of altitude variation).
So, one option is to label activity to create a new stationary period. The other option, which avoids creating too many stationary periods, is to label pressure datapoints as outliers. These datapoints won’t be used in the matching of the timeseries.

Here is a possible way to handle the Red-capped Robin-chat example. Tightening the pressure y-axis while widening the time x-axis makes it easier to see the generally smooth natural temporal variation of pressure that we want to capture. The fine-scale temporal variation of pressure can then be attributed to local movements of the bird (e.g. foraging in an area with topographical variation). My proposition here is to create a new stationary period of a couple of hours, and then mark the pressure variation up to the 13th as outliers because of too much variation, as well as the variation around 9am.
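The distinction between smooth natural variation and fine-scale variation can also be explored numerically. The sketch below is not part of GeoPressureR: it applies a running median from base R to a made-up hourly series and lists the datapoints that deviate from it. The window of 7 datapoints and the 2 hPa threshold are arbitrary assumptions, and the result should only be used as a list of candidates to inspect in TRAINSET.

```r
# Synthetic hourly pressure: slow weather-driven variation plus three
# short excursions caused by local altitude changes (made-up values).
n <- 72
weather <- 950 + 4 * sin(seq(0, 2 * pi, length.out = n))
obs <- weather
excursion <- c(20, 21, 50)
obs[excursion] <- obs[excursion] - 6

# A running median follows the smooth signal and ignores short excursions.
smooth <- runmed(obs, k = 7)

# Candidate outliers: datapoints far from the smooth signal.
candidate <- which(abs(obs - smooth) > 2) # 2 hPa: arbitrary threshold
print(candidate)
```

Excursions that last longer than about half the window would shift the running median itself, so longer altitude changes still have to be identified by eye or with test 3.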
Future improvements
A lot can be done to improve this process:
- Run TRAINSET offline.
- Bypass the create csv, upload csv, read csv steps by running a browser session directly in R.
- Build an R (shiny) equivalent of TRAINSET to be directly integrated with the R package. Problem: we can’t find a good package to label points in a figure in R, and we would have to maintain it while TRAINSET is doing that for free.
- Any suggestions? Write an issue on GitHub.